Regular, median and Huber cross-validation: A computational comparison
Abstract
We present a new technique for comparing models that combines a median form of cross-validation with least median of squares estimation (MCV-LMS). Rather than minimizing the sum of the squared residuals, we minimize the median of the squared residuals. We compare this with a robustified form of cross-validation that uses the Huber loss function and robust coefficient estimators (HCV). Through extensive simulations we find that, for linear models and data that are representative of the data generator, MCV-LMS outperforms HCV when the tails of the noise distribution are sufficiently heavy and asymmetric. We also find that MCV-LMS is often better able to detect the presence of small terms. Otherwise, HCV typically outperforms MCV-LMS for ‘good’ data. MCV-LMS also outperforms HCV when enough severe outliers are present. For linear models, one of MCV-LMS and HCV generally gives better model selection than the conventional version of cross-validation with least squares estimators (CV-LS) when the tails of the noise distribution are heavy or asymmetric, or when the coefficients are small and the data are representative. CV-LS performs well only when the tails of the error distribution are light and symmetric and the coefficients are large relative to the noise variance. Outside of these contexts and the contexts noted above, HCV outperforms CV-LS and MCV-LMS. We illustrate CV-LS, HCV, and MCV-LMS via numerous simulations to map out when each does best on representative data, and then apply all three to a real dataset from econometrics that includes outliers. © 2015 The Authors. Statistical Analysis and Data Mining published by Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2015
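The three criteria compared in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the random elemental-subset approximation to least median of squares, the IRLS Huber fit, the tuning constant 1.345, the unscaled Huber loss on held-out residuals, and the 5-fold split are all choices made here for illustration.

```python
import numpy as np

def fit_ls(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_lms(X, y, n_trials=500, seed=0):
    """Approximate least median of squares via random elemental subsets
    (an assumed approximation; exact LMS is combinatorial)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta = fit_ls(X, y)  # fallback candidate
    best_crit = np.median((y - X @ best_beta) ** 2)
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])  # exact fit to p points
        except np.linalg.LinAlgError:
            continue  # singular subset, skip it
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta

def fit_huber(X, y, delta=1.345, n_iter=50):
    """Huber M-estimate by iteratively reweighted least squares,
    with a MAD-based residual scale (assumed choices)."""
    beta = fit_ls(X, y)
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def cv_score(X, y, fit, aggregate, k=5, seed=0):
    """k-fold CV: fit on each training part, pool the held-out residuals,
    then summarize them with `aggregate`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    held_out = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = fit(X[train], y[train])
        held_out.append(y[test] - X[test] @ beta)
    return aggregate(np.concatenate(held_out))

# Hypothetical demo data: heavy-tailed noise plus a few gross outliers.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=2, size=n)
y[:5] += 30.0

candidates = {"intercept only": X[:, :1], "intercept + slope": X}
for name, Xm in candidates.items():
    cv_ls = cv_score(Xm, y, fit_ls, lambda r: np.mean(r ** 2))
    hcv = cv_score(Xm, y, fit_huber, lambda r: np.mean(huber_loss(r)))
    mcv_lms = cv_score(Xm, y, fit_lms, np.median_squared if False else (lambda r: np.median(r ** 2)))
    print(f"{name}: CV-LS={cv_ls:.3f}  HCV={hcv:.3f}  MCV-LMS={mcv_lms:.3f}")
```

Under each criterion, the candidate model with the smallest score would be selected. The only difference between the three is which estimator fits the training folds and how the held-out residuals are summarized: mean of squares (CV-LS), mean Huber loss (HCV), or median of squares (MCV-LMS).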
Similar resources
Transcript Mapping with High-Density Tiling Arrays
M Barnes, J Freudenberg, S Thompson, et al. Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Res, 33:5914–5923, 2005. Mark Dunning, Mike Smith, Natalie Thorne and Simon Tavaré Computational Biology Group Hutchison / MRC Research Centre Department of Oncology University of Cambridge Hills Rd, Cambridge CB2 2XZ United Ki...
Comparison of Heat Transfer Power of the Cooling Panel with Square Cross Section and Circular Cross Section in Electric Arc Furnaces Steelmaking by the use of Computational Fluid Dynamics
Steel pipes with circular cross-sections are usually used in cooling panels of electric arc furnaces. In the present study pipes with square cross-sections under equivalent conditions were used to obtain more information on the possibility of increasing the heat transfer and cooling efficiency. The results showed increased efficiency of the square pipe compared to the circular cross-section pip...
Dominance in hands and cross-sectional area of median nerve in carpal tunnel syndrome
Introduction: Currently, neuroresearchers report that the median nerve shows severity-correlated intracarpal enlargement in idiopathic carpal tunnel syndrome (CTS) as a most common peripheral neuropathic disorder. The purpose of this paper was to investigate ultrasonography morphological findings in patients with idiopathic CTS and comparing some physical properties such as age, gender, BMI wit...
Relaxed Lasso
The Lasso is an attractive regularisation method for high dimensional regression. It combines variable selection with an efficient computational procedure. However, the rate of convergence of the Lasso is slow for some sparse high dimensional data, where the number of predictor variables is growing fast with the number of observations. Moreover, many noise variables are selected if the estimato...
Experimental Efforts for Predictive Computational Fluid Dynamics Validation
Ideally, Validation and Verification methods to be used to deliver Predictive Computational Fluid Dynamics (P-CFD) methods for design use, must utilize a consistent set of detailed input to perform an output response comparison to associated data that can accurately characterize the uncertainty in the calculated results. In this paper, some experimental efforts are described and discussed in re...
Journal title:
- Statistical Analysis and Data Mining
Volume 8, Issue
Pages -
Publication date 2015